WIP: Move DateTime native code into managed #21848

filipnavara · 2019-01-07T14:54:11Z

Just testing build and unit tests now. If it works out then the code could be shared between CoreCLR and CoreRT completely.

jkotas · 2019-01-07T16:06:03Z

Some of this is implemented as FCalls to avoid PInvoke overhead. This would need performance testing as well.

filipnavara · 2019-01-07T16:25:08Z

Some of this is implemented as FCalls to avoid PInvoke overhead. This would need performance testing as well.

I am aware of that. The history of the repository shows that has been done in other places. However, the BOTR explicitly states

If the only reason you're defining a FCall method is to call a native Win32 method, you should be using P/Invoke to call Win32 directly.

Maybe it's time to have some updated guidance on that.

filipnavara · 2019-01-07T16:39:38Z

@dotnet-bot test Windows_NT arm Cross Debug Innerloop Build
@dotnet-bot test Windows_NT arm64 Cross Debug Innerloop Build
@dotnet-bot test Windows_NT x64 Release CoreFX Tests

filipnavara · 2019-01-08T03:45:37Z

I've done some preliminary benchmarks and it shows some rather interesting things. Firstly, my initial implementation with LazyInitializer and delegates is about 2x slower than the baseline. Switching to an approach with statically initialized s_systemSupportsPreciseSystemTime variable and simple if condition cuts down the difference quite a lot:

Method	Mean	Error	StdDev
UtcNow [Old]	25.32 ns	0.4442 ns	0.3938 ns
UtcNow [New]	30.75 ns	0.6206 ns	0.5805 ns

Now, these numbers are not completely representative since it was run on .NET Core 2.1.6 runtime with a code copied to a separate BenchmarkDotNet project. Perhaps there's a better way to benchmark something like this?

Another interesting observation is that the GetSystemTimePreciseAsFileTime is nearly twice as slow on my system as GetSystemTimeAsFileTime.

Suchiman · 2019-01-08T08:42:05Z

@filipnavara

Another interesting observation is that the GetSystemTimePreciseAsFileTime is nearly twice as slow on my system as GetSystemTimeAsFileTime.

That seems to be expected: #9736

filipnavara · 2019-01-08T10:29:08Z

@Suchiman Thanks, I missed that remark in the PR.

Looks like .NET Core 2.1.6 doesn't properly optimize the static readonly bool variable trick. I will retry the benchmarks on newer runtime.

src/System.Private.CoreLib/src/System/DateTime.Windows.cs

filipnavara · 2019-01-08T11:33:24Z

I had some flaws in my benchmark. This is current state with .NET Core 3.0 Preview 1 runtime:

Method	Mean	Error	StdDev
UtcNow [Old]	25.34 ns	0.1404 ns	0.1313 ns
UtcNow [New]	27.81 ns	0.2964 ns	0.2628 ns

Is this worth pursuing further? Maybe there are ways to squeeze the last few nanoseconds back.

danmoseley · 2019-01-08T19:37:51Z

@filipnavara UtcNow is often on the hot path, yes I think it is worth investigating further perf improvements.

filipnavara · 2019-01-08T19:41:03Z

Well, I can scale back the PR and keep the FCalls and move just the rest to shared partition. I am not sure if I will be able to squeeze any more performance out of the P/Invoke approach.

jkotas · 2019-01-09T06:55:36Z

The performance is closer than what I thought it is going to be. I will take a look at this in detail when I get a chance.

filipnavara · 2019-01-09T08:16:51Z

Thanks! Any advice or help is appreciated.

filipnavara · 2019-01-09T11:07:08Z

There are some particularly bad things in the code gen, which could likely be eliminated:

00007ff8`c8b1f667 488945b0        mov     qword ptr [rbp-50h],rax
00007ff8`c8b1f66b eb00            jmp     00007ff8`c8b1f66d // Jumping to next instruction, really?
00007ff8`c8b1f66d 488b45b0        mov     rax,qword ptr [rbp-50h] // Loading whatever we wrote there two instructions ago

mikedn · 2019-01-09T12:00:41Z

There are some particularly bad things in the code gen, which could likely be eliminated:

Hmm, can you post more code? I've seen such jumps before but they're not common.

filipnavara · 2019-01-09T12:03:43Z

@mikedn I extracted the PInvoke alone into a small function

        public unsafe void JustPInvoke()
        {
            long timestamp;
            Interop.Kernel32.GetSystemTimePreciseAsFileTime(&timestamp);
        }

and looked at the generated code (.NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT):

; DateTimeTest.DateTimeBenchmark.JustPInvoke()
       push    rbp
       push    r15
       push    r14
       push    r13
       push    r12
       push    rdi
       push    rsi
       push    rbx
       sub     rsp,78h
       lea     rbp,[rsp+0B0h]
       xor     edx,edx
       mov     qword ptr [rbp-40h],rdx
       mov     qword ptr [rbp-48h],rdx
       mov     qword ptr [rbp+10h],rcx
       lea     rcx,[rbp-80h]
       mov     rdx,r10
       call    coreclr+0xad090
       mov     qword ptr [rbp-48h],rax
       mov     rcx,rsp
       mov     qword ptr [rbp-60h],rcx
       mov     rcx,rbp
       mov     qword ptr [rbp-50h],rcx
       lea     rcx,[rbp-40h]
       xor     r11d,r11d
       mov     rax,7FF89F1F39D8h
       mov     qword ptr [rbp-70h],rax
       lea     rax,[00007ff8`9f0affde]
       mov     qword ptr [rbp-58h],rax
       mov     rax,qword ptr [rbp-48h]
       lea     rdx,[rbp-80h]
       mov     qword ptr [rax+10h],rdx
       mov     rax,qword ptr [rbp-48h] ; This is loading a value that was already loaded 4 lines above
       mov     byte ptr [rax+0Ch],0
       call    qword ptr [00007ff8`9f1f3b70]
       mov     rdx,qword ptr [rbp-48h]
       mov     byte ptr [rdx+0Ch],1
       cmp     dword ptr [coreclr!g_CLREngineMetrics+0x9080
       je      M00_L00
       call    qword ptr [coreclr!CreateCLRProfiling+0x20b188
M00_L00:
       mov     rax,qword ptr [rbp-48h]
       mov     rdx,qword ptr [rbp-78h]
       mov     qword ptr [rax+10h],rdx
       jmp     M00_L01 ; Unnecessary jump
M00_L01:
       mov     rax,qword ptr [rbp-48h] ; Unnecessary load (probably not eliminated because of the jump?)
       mov     byte ptr [rax+0Ch],1
       lea     rsp,[rbp-38h]
       pop     rbx
       pop     rsi
       pop     rdi
       pop     r12
       pop     r13
       pop     r14
       pop     r15
       pop     rbp
       ret
...
; Total bytes of code 204

It could very well be something screwed up by BenchmarkDotNet or my .NET Core installation. Smallest repro code here:

using System;
using System.Runtime.InteropServices;

public class C {
        [DllImport("kernel32", ExactSpelling = true, SetLastError = false)]
        internal static unsafe extern void GetSystemTimePreciseAsFileTime(long* lpSystemTimeAsFileTime);

    	public unsafe void JustPInvoke()
        {
            long timestamp;
            GetSystemTimePreciseAsFileTime(&timestamp);
        }
}

AndyAyersMS · 2019-01-09T16:29:56Z

You are probably looking at unoptimized tier-0 code; here's what I see for your repro.

; Assembly listing for method C:JustPInvoke():this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd
;  V01 loc0         [V01    ] (  1,  1   )    long  ->  [rbp-0x40]   do-not-enreg[X] must-init addr-exposed ld-addr-op
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V03 FramesRoot   [V03,T00] (  7,  7   )    long  ->  rsi         "Pinvoke FrameListRoot"
;  V04 PInvokeFrame [V04    ] (  8,  8   )     blk (64) [rbp-0x80]   do-not-enreg[X] addr-exposed "Pinvoke FrameVar"
;
; Lcl frame size = 104

G_M55676_IG01:
       push     rbp
       push     r15
       push     r14
       push     r13
       push     r12
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 104
       lea      rbp, [rsp+A0H]
       xor      rdx, rdx
       mov      qword ptr [rbp-40H], rdx

G_M55676_IG02:
       lea      rcx, bword ptr [rbp-78H]
       mov      rdx, r10
       call     CORINFO_HELP_INIT_PINVOKE_FRAME
       mov      rsi, rax
       mov      rcx, rsp
       mov      qword ptr [rbp-58H], rcx
       mov      rcx, rbp
       mov      qword ptr [rbp-48H], rcx
       lea      rcx, bword ptr [rbp-40H]
       xor      r11, r11
       mov      rax, 0xD1FFAB1E
       mov      qword ptr [rbp-68H], rax
       lea      rax, G_M55676_IG04
       mov      qword ptr [rbp-50H], rax
       lea      rax, bword ptr [rbp-78H]
       mov      qword ptr [rsi+16], rax
       mov      byte  ptr [rsi+12], 0

G_M55676_IG03:
       call     [C:GetSystemTimePreciseAsFileTime(long)]

G_M55676_IG04:
       mov      byte  ptr [rsi+12], 1
       cmp      dword ptr [(reloc)], 0
       je       SHORT G_M55676_IG05
       call     [CORINFO_HELP_STOP_FOR_GC]

G_M55676_IG05:
       mov      rax, bword ptr [rbp-70H]
       mov      qword ptr [rsi+16], rax

G_M55676_IG06:
       mov      byte  ptr [rsi+12], 1

G_M55676_IG07:
       lea      rsp, [rbp-38H]
       pop      rbx
       pop      rsi
       pop      rdi
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       pop      rbp
       ret      

; Total bytes of code 157, prolog size 30 for method C:JustPInvoke():this

BDN should be getting you to tier-1 code, but you can force the issue via COMPlus_TieredCompilation=0.

filipnavara · 2019-01-09T16:45:56Z

@AndyAyersMS Yeah, there's BDN bug filled for that (dotnet/BenchmarkDotNet#899). I was seeing some weird output with COMPlus_TieredCompilation=0 earlier, but likely due to some other mistake I made. Your output is way more readable than whatever I managed to get locally.

Suchiman · 2019-01-09T17:36:03Z

@filipnavara you can get that Output via https://github.com/dotnet/coreclr/blob/master/Documentation/building/viewing-jit-dumps.md

stephentoub · 2019-02-06T17:15:07Z

Another interesting observation is that GetSystemTimePreciseAsFileTime is nearly twice as slow

Yeah, we decided in #9736 that the performance difference was worth it for the accuracy.

filipnavara · 2019-02-11T20:46:23Z

Closing for now, thanks for all the valuable feedback. The Unix part was superseded by #22383 and the P/Invoke performance improvements are discussed there and in #22320.

filipnavara · 2019-10-24T11:58:51Z

Note to myself: This should be resurrected now that SuppressGCTransition attribute is available.

Move DateTime native code into managed

fe5e966

filipnavara force-pushed the datetime1 branch from ceedd41 to fe5e966 Compare January 7, 2019 15:02

Fix Unix build

0ba6db4

Performance optimization

f49e516

marek-safar reviewed Jan 8, 2019

View reviewed changes

src/System.Private.CoreLib/src/System/DateTime.Windows.cs Outdated Show resolved Hide resolved

Remove unused code

de56912

filipnavara closed this Feb 11, 2019

This was referenced Apr 9, 2019

[netcore] Make System.Private.CoreLib and host APIs compilable on Windows mono/mono#13943

Merged

Move DateTime Windows implementation to shared CoreLib #23888

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Move DateTime native code into managed #21848

WIP: Move DateTime native code into managed #21848

filipnavara commented Jan 7, 2019

jkotas commented Jan 7, 2019 •

edited

Loading

filipnavara commented Jan 7, 2019

filipnavara commented Jan 7, 2019

filipnavara commented Jan 8, 2019

Suchiman commented Jan 8, 2019

filipnavara commented Jan 8, 2019 •

edited

Loading

filipnavara commented Jan 8, 2019

danmoseley commented Jan 8, 2019

filipnavara commented Jan 8, 2019

jkotas commented Jan 9, 2019

filipnavara commented Jan 9, 2019

filipnavara commented Jan 9, 2019

mikedn commented Jan 9, 2019

filipnavara commented Jan 9, 2019

AndyAyersMS commented Jan 9, 2019

filipnavara commented Jan 9, 2019

Suchiman commented Jan 9, 2019

stephentoub commented Feb 6, 2019

filipnavara commented Feb 11, 2019

filipnavara commented Oct 24, 2019

WIP: Move DateTime native code into managed #21848

WIP: Move DateTime native code into managed #21848

Conversation

filipnavara commented Jan 7, 2019

jkotas commented Jan 7, 2019 • edited Loading

filipnavara commented Jan 7, 2019

filipnavara commented Jan 7, 2019

filipnavara commented Jan 8, 2019

Suchiman commented Jan 8, 2019

filipnavara commented Jan 8, 2019 • edited Loading

filipnavara commented Jan 8, 2019

danmoseley commented Jan 8, 2019

filipnavara commented Jan 8, 2019

jkotas commented Jan 9, 2019

filipnavara commented Jan 9, 2019

filipnavara commented Jan 9, 2019

mikedn commented Jan 9, 2019

filipnavara commented Jan 9, 2019

AndyAyersMS commented Jan 9, 2019

filipnavara commented Jan 9, 2019

Suchiman commented Jan 9, 2019

stephentoub commented Feb 6, 2019

filipnavara commented Feb 11, 2019

filipnavara commented Oct 24, 2019

jkotas commented Jan 7, 2019 •

edited

Loading

filipnavara commented Jan 8, 2019 •

edited

Loading